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ABSTRACT 

We consider a streaming data model in which n sensors ob- 
serve individual streams of data, presented in a turnstile 
model. Our goal is to analyze the singular value decompo- 
sition (SVD) of the matrix of data defined implicitly by the 
stream of updates. Each column i of the data matrix is given 
by the stream of updates seen at sensor i. Our approach is 
to sketch each column of the matrix, forming a "sketch ma- 
trix" Y , and then to compute the SVD of the sketch matrix. 
We show that the singular values and right singular vectors 
of Y are close to those of X, with small relative error. We 
also believe that this bound is of independent interest in 
non-streaming and non-distributed data collection settings. 

Assuming that the data matrix X is of size N x n, then 
with m linear measurements of each column of X, we ob- 
tain a smaller matrix Y with dimensions m x n. If m = 
0(fce _2 (log(l/e) + log(l/<5)), where k denotes the rank of 
X, then with probability at least 1 — 5, the singular values 
a'j of Y satisfy the following relative error result 
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as compared to the singular values aj of the original matrix 
X. Furthermore, the right singular vectors v'j of Y satisfy 
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as compared to the right singular vectors Vj of X. We ap- 
ply this result to obtain a streaming graph algorithm to 
approximate the eigenvalues and eigenvectors of the graph 
Laplacian in the case where the graph has low rank (many 
connected components) . 



Copyright statement goes here. 



1. INTRODUCTION 

Consider a collection of data arranged in a matrix X of 
size N x n. Each column represents a length- N signal (or 
image, frequency counts of terms in a particular document, 
etc.) and there are n such observations. The singular value 
decomposition (SVD) of X, 

X = UTV T , 

carries important information about the structure of the 
data set, especially when the rank k of X is small. In par- 
ticular, the columns of U (known as the left singular vectors 
of X) span the principal directions of the data set and can 
be used as basis vectors for building up typical signals, and 
the diagonal entries of E (known as the singular values of 
X) reflect the energy of the data set in each of these direc- 
tions. The extraction of these features is commonly known 
as Principal Component Analysis (PC A), and PC A is a fun- 
damental and commonly used tool in data analysis and com- 
pression. This exact same process can be viewed through a 
slightly different lens when one imagines the columns of X 
as independent realizations of a length-iV random vector x. 
Computing the left singular vectors of X is equivalent to 
computing the eigenvectors of XX T , which (up to rescal- 
ing) is the N x N sample covariance matrix of the data. 
In this context, PCA is also known as the Karhunen-Loeve 
(KL) Transform, and the KL Transform is a fundamental 
and commonly used tool in statistics. 

There are in fact a number of applications where the right 
singular vectors V of a data matrix X are more important, 
or equivalently, where the eigenvectors of X T X carry the 
structure of interest. For example, the product El/ T gives 
a low-dimensional embedding of the data set that preserves 
distances and angles between the n data vectors. This em- 
bedding can be used for clustering or categorizing the sig- 
nals [6,27]; for example, this is used for comparing docu- 
ments in latent semantic analysis. The right singular vec- 
tors of X can also be viewed as the result of applying the KL 
Transform to the rows, rather than the columns, of X. In 
this sense, the columns of V describe the inter-signal (rather 
than intra-signal) statistical correlations. In cases where the 
column index corresponds to a distinct sensor position, or a 
vertex in a graph, etc., this correlation structure can carry 
important structural information [8,31]. 



Unfortunately, many of the applications in which we seek the 
right singular vectors of X (equivalently, the eigenvectors 
of X T X) are those in which the data is simply too large, 



too distributed, or generated too quickly for us to store the 
data or to process it efficiently in one, centralized location. 
There are, however, settings in which the data sets — while 
large — have low intrinsic dimension or are of low rank. Let 
us suppose that the length of each data vector N is much 
larger than the number of observations n, and suppose that 
X has rank k < n. The data may or may not be generated 
in a dynamic, streaming fashion and it may or may not be 
collected in a distributed fashion amongst n sensors. 

We wish to design a joint observation process (which can 
be distributed amongst n sensors) that maintains a "sketch" 
of the data stream and a reconstruction process that, at a 
central location, reconstructs not an approximation of the 
original data, but rather a good approximation to the sin- 
gular values (jj and the (right) singular vectors Vj of the 
original matrix X. The sketch of the data stream should 
be a linear, non-adaptive procedure, one that is efficient to 
update, and one that uses as few observations of the data 
matrix as possible so that as little communication as possible 
is required from the sensors to the central processing entity. 
Because the procedure is linear and non-adaptive, we can 
represent the sketch as a matrix-matrix product <&X = Y , 
where the observation matrix $ is of size m x N and the 
sketch Y is of size m x n. We want m as small as possible. 
From the sketch matrix Y, we want to produce estimates 
a'j of the k non-zero singular values and estimates v'j of the 
associated (right) singular vectors of X such that 

and 

hi - Wj ||a < n, 

where 7 is a function of the smallest gap between the singular 
values of X. 

Most of the current work on low-rank matrix approxima- 
tions, robust PC A [7], rank-revealing QR factorizations (see 
[20] and the references therein for a comprehensive survey), 
etc. has focused on obtaining a good approximation X to 
the original data matrix X, albeit one that is parsimonious. 
Some of the measurement schemes [7] sketch both the row 
and column space of X, collecting measurements Y = 3>X$ T , 
and some sketch the column space only so as to derive a few 
orthogonal vectors that span the column space [20]. These 
results are of the form 

\\X-X\\ X < (l + e)||X]|* 

for some norm X (typically, the Frobenius norm, but others 
as well). Our goal is an approximation to the singular vec- 
tors and singular values of X themselves, directly, and while 
one could apply standard perturbation theory techniques to 
compare the singular vectors of an approximation X to those 
of the original data X, the error guarantees would be rather 
poor. Furthermore, while one could ask about preserving 
the subspaces spanned by the singular vectors, there are 
many applications from PCA to data clustering, image seg- 
mentation, graph embedding, and modal analysis in which 
the individual singular vectors are critical for data analysis 
tasks, and data reconstruction is not necessarily required. 

Our algorithmic approach is straightforward. We sketch one 
side of the Nxn data matrix X, maintaining a sketch matrix 



Y = $X of size mxn. (The fact that the sketch is one-sided 
allows it to be computed sensor-by-sensor in distributed data 
collection settings.) We then compute the SVD of the sketch 
matrix Y, using standard (iterative) SVD algorithms. Our 
analysis is quite different from that of most randomized lin- 
ear algebra methods. We assume that the sketching ma- 
trix $ is randomly generated and satisfies the distributional 
Johnson-Lindenstrauss (JL) property (see Definition 1) so 
that with high probability it acts as a near isometry on the 
column span of X, and we then exploit relative error (as op- 
posed to absolute error) perturbation analysis for determin- 
istic (as opposed to random) matrices to obtain our results. 
As detailed in our main result, Theorem 1, we can obtain 
accurate relative estimates for the singular values, and in 
some cases we can obtain accurate estimates for the sin- 
gular vectors as well. However, we struggle to achieve high 
accuracy in the singular vectors when the corresponding sin- 
gular values are close. This is a consequence of well-studied 
perturbation theory and seems inherent in our approach. 

One major application of our work is to determining struc- 
tural graph properties of streaming graph data, albeit for 
low-rank graphs (ones with many connected components). 
Recent work on the structure and evolution of online social 
networks [22] suggests that a significant fraction of vertices 
in such networks participate in isolated communities, "small 
groups who interact with one another but not with the net- 
work at large." 

In Section 2, we set the stage for our mathematical problem 
and in Section 3, we outline the related work, including an 
overview of the relative error perturbation techniques from 
linear algebra that we will use. In Section 4, we present our 
main result which we apply, in Section 5, to the spectral 
analysis of streaming graphs. 

2. PROBLEM SETUP 

Let X denote the Nxn real-valued data matrix. For exam- 
ple, one may envision that each column represents a length- 
N time series signal collected from one of n sensors. Let 
us assume that N > n, and that rank(X) — k < n. We 
write the truncated SVD of X as X = Ux ExVx , where un- 
like the full SVD, Ux is N x k, Ex = diag(<ri, . . . , a k ) with 
cti > ■ ■ ■ > as, > 0, and Vx is n x k. Our goal in this paper 
is to estimate the singular values Ex and the right singular 
vectors Vx from a low-dimensional sketch obtained by left- 
multiplying X with a compressive matrix. In particular, we 
let $ denote a sketching matrix of dimension m x N, and 
we denote the sketch of X by Y — &X. We are specifically 
interested in cases where m < N and thus Y is a shorter 
matrix than X. We also note that the sketching matrix $ 
can be applied individually to each column of X, and these 
sketched columns can be concatenated to form the mxn 
matrix Y . In other words, the sketching can be performed 
sensor- by-sensor. 

Our algorithm for estimating Ex and Vx from Y is ex- 
plained in Section 4.1. This algorithm is very simple and 
is based on the idea that under a suitable choice of 
the singular values and right singular vectors of Y will ap- 
proximate the singular values and right singular vectors of 
X. In order to state our results, we write the truncated 
SVD of Y as Y = Uy^yVf, where U Y is m x k, Ey = 



diag(cr( , . . . ,o' k ) with a[ > • • • > a' k > 0, and W is n x fc. 
(We will be interested in cases where m > fc, and typi- 
cally Y will have rank fc just like X.) It will also be useful 
for us to write the eigendecompositions of X T X and Y T Y 
as X T X = VxAxVj and Y T Y = V Y A Y V?, respectively, 
where Ax = diag(Ai, . . . , Afc) and Ay = diag(A' 1 , . . . , A' fc ). 
(Although it is not common practice we will use the "trun- 
cated" eigendecomposition so that Vx and Vy are both of 
dimension n x fc and Ax and Ay are both of dimension 
fc x fc; when fc = n we will have the usual eigendecomposi- 
tion.) Note that Aj = cr| and Aj = of for j = 1, . . . , fc. 

To ensure that the spectral information about X is preserved 
in the sketched matrix Y = $X, we rely on randomness to 
construct the sketching matrix $. Any random distribution 
that can be used to construct a Johnson-Lindenstrauss (JL) 
embedding can be used to generate $. 

Definition 1. An m x N random matrix $ is said to 
satisfy the distributional JL property if for any fixed x £ 
R N , and any < e < 1, 

Pr[\\\$xf 2 -\\x\\ 2 2 \>e\\xf 2 ] <2e~ m ^\ 

where /(e) > is a constant depending only on e. 

For most random matrices satisfying the distributional JL 
property, the functional dependence on e, /(e), is quadratic 
in e as e — > 0. For compactness, we suppress this except 
where necessary for quantifying the number of measure- 
ments or the run time of our algorithm. There are a va- 
riety of random matrix constructions known to possess the 
distributional JL property. Notably, random matrices pop- 
ulated with independent and identically distributed (i.i.d.) 
subgaussian entries will possess this property [12]. Subgaus- 
sian distributions include suitably scaled Gaussian and ±1 
Bernoulli random variables. Other examples of non-i.i.d. JL 
matrices and discussions of the function /(e) are contained in 
works such as [11, 21]. There are many papers that address 
the sparsity of a JL matrix, the speed of its application to 
a vector, the minimum number of rows it must possess, the 
minimum amount of randomness necessary to generate such 
a matrix, etc. For our results, a random matrix satisfying 
the distributional JL property is sufficient and, depending 
on the particular application (streaming versus static data), 
we want either fast update times (i.e., a sparse transform) 
or a fast transform. We appeal to a long line of work in as- 
sessing the qualities of such transforms and in constructing 
them, either randomly or deterministically. 

Finally, we note that a primary objective of this work is to 
quantify the amount of perturbation of the right singular 
vectors of X under the random measurement operator <E>. 
For j = 1, . . . , fc, let us denote the jth column of Vx as Vj 
and the jth column of Vy as v'j. Our bounds concern the 
quantity \\vj — v'j 1 1 2 . However, we note that the right singular 
vectors Vj and v'j are each unique only up to multiplication 
by —1. Thus, without loss of generality, we will assume that 
the sign of each v'j is chosen so that it is positively correlated 
with Vj . That is, we will assume for each j that (vj ,v'j) > 0. 

3. RELATED WORK 



In order to quantify the amount of change in the singular val- 
ues and the right singular vectors, we approach this problem 
from the matrix perturbation theoretic perspective. To see 
the connection to matrix perturbation theory, let us write 

Y T Y = X T $> T $>X = VxT.xUx$ T $UxZxVx- 

Defining A$ := <I> T <I> — /, Y T Y becomes 

Y T Y = Vx Ex Ul (I + A«) U x Ex V$ , 

= VxT&Vx + VxT-xUl A*U x ExV%. 

Given this equation, we can think of the symmetric matrix 
Y T Y as being the summation of some original symmetric 
matrix 

A := VxT? x V$ 

and a perturbation matrix 

E := VxY.xUl^Ux^xVZ- 

Roughly speaking, when E is small in some sense, it would 
be reasonable to expect Y T Y to have approximately the 
same spectral information as A. Thus we can think about 
our problem as quantifying the amount of change between 
the eigenvalues and eigenvectors of A (which equal the squared 
singular values and the right singular vectors of X) and those 
of Y T Y under the additive perturbation E. 

In the following sections, we briefly review some of the im- 
portant results in the matrix perturbation theory literature 
and also discuss the connection of our problem to the si- 
multaneous iteration method, an important algorithm for 
computing the eigendecomposition of a matrix. 

3.1 Absolute Bounds 

There is an extensive literature in the field of matrix per- 
turbation theory quantifying the amount of change in the 
eigenvalues and eigenvectors of a symmetric n x n matrix A 
when it undergoes an additive perturbation. A perturbed 
matrix, A, may be written in the form of A — A + E, where 
E is the perturbation matrix that is being added to A. It 
is well-known that the eigenvalues of A and those of A will 
be close to one another when the amount of perturbation E 
is small (typically with respect to the 2-norm of E). Let us 
denote the jth largest eigenvalues of A and A as Aj and Aj, 
respectively. Then, it is known [19] that for j = 1, . . . , n, 

\Xj - Xj\ < \\E\\ 2 . (1) 

Thus we can see that the distance between each perturbed 
eigenvalue and the corresponding original eigenvalue will de- 
pend on the amount of perturbation, i.e., 1 1 1 j 2 - 

To discuss the perturbation in the eigenvectors let us first 
represent the eigenvectors of A as Vj such that Avj = XjVj. 
Similarly, let us define the eigenvectors of A as Vj such that 
Avj = XjVj . It is well known that for general matrices A and 
E, the eigenvectors Vj and Vj may vary drastically even when 
the amount of perturbation is small. In other words, \\vj — 
Vj\\ 2 can be large even when ||-E||2 is small. To see why, let 
us for example look at the case when two eigenvalues, Ai and 
A2, of A are equal to each other. For such a case, we know 
that the eigenvectors corresponding to those eigenvalues, vi 
and i>2, will not be unique: any linear combination of the two 



eigenvectors will also be a valid eigenvector corresponding 
to the same eigenvalue. A perturbation to this matrix will 
generally cause Ai and A2 to split into two eigenvalues Ai 
and A2, each of which will satisfy equation (f) above. If Ai 
and A2 are distinct, the corresponding eigenvectors vi and 
v 2 will now be uniquely identified. Since Wi and v 2 were 
not unique, it is possible for V\ to differ from any particular 
choice of vi and for v 2 to differ from any particular choice 

Of V2- 

The perturbation in the eigenvectors, however, is not com- 
pletely arbitrary. It is known that the angle between the 
space spanned by vi and v 2 and the space spanned by vi 
and v 2 will be small if E is small. To state this result more 
concretely, let us represent the eigendecompositions of A and 
A as 



A = VAV T = (Vi V 2 ) 



and 



VAV 1 



Vi V 2 



Ai 
A 2 



Ai 
A 2 



V? 



V? 



The eigenvalue matrices are such that, 

Ai = diag(Ai, . . . , A p ), A 2 = diag(A p+ i, . . . , A„), 
Ai = diag(Ai, . . . , A p ), A 2 = diag(A p+ i, . . . , A n ) 

for an arbitrarily chosen p £ {2, . . . ,n— 1}. 

It is possible to quantify how close the spaces spanned by 
the columns of Vi and Vi are. In order to provide a measure 
of closeness between the two spaces, the following notion of 
angle matrix was defined in [14]: 

0{X 1 ,X 2 ) := 

arccos ( (X?Xi ) " * X[X 2 (XfX a ) _1 JfJ-Xi (Xf X 1 ) " 



where X^ and X 2 are two matrices of the same dimension 
n x p with n > p and full column rank. The singular values 
of <d(Xi,X 2 ) are the angles required to rotate the space 
spanned by X\ onto that of X 2 . Going back to our notation 
of Vi, V 2 , Vi, and V 2 , it was shown [14] that 



I sin Q(Vi, Vi) 



|V 2 T Vi| 



for any unitarily invariant norm. This fact can be used to 
bound the angle between Vi and Vi. In particular, if k := 
min |A(Ai) - A(A 2 )| > 0, then 

||sme(Fi,T/i)]|i, = ||V 2 T Vi|| F < i!^. 

K 

Once again we note that the above bound relies on the abso- 
lute separation between eigenvalues (in contrast with the rel- 
ative separation, which appears in Section 3.2). The above 
can be further generalized to any invariant norm. Detailed 
discussion on this subject can be found in [14]. 

3.2 Relative Bounds 

The perturbation results discussed above are in terms of 
the absolute differences between eigenvalues. These types 
of results are most useful for ensuring the preservation of 



the largest eigenvalues but least useful for ensuring the rel- 
ative preservation of the smallest eigenvalues; a small abso- 
lute change in a small eigenvalue could actually correspond 
to a large relative change in that eigenvalue. The absolute 
perturbation results are the best we can do when the per- 
turbation to A is completely arbitrary. However, when the 
perturbation exhibits some structure one can do much better 
than what the absolute error bounds indicate. 

Consider the class of perturbations that take the form A — 
A + E = D T AD, where D is non-singular. It was shown [16] 
that in this case a relative perturbation bound for the eigen- 
values is given by 



|A, 



\D T D 



where the factor ||D D — I\\ 2 represents how close D is to 
being an orthonormal matrix. In the extreme case when 
D has orthonormal columns we will have that Aj = Aj as 
expected. 

Similarly, the angle < 9j < ^ between the jth eigenvec- 
tor and its corresponding perturbed eigenvector has been 
shown [16] to satisfy 



\\D T D\\ 2 \\(DD T )- 1 ~I\\ 2 
~ Pj (A)-\\D T D-I\\ 2 



D-I\\ 



provided that pj(A) > \\D T D — I\\ 2 , where the jth relative 
gap, pj(A), of the eigenvalues of A is defined as 

Pi(A) =mm 1 

! " I Ail 

We can see that for the type of perturbation described above 
we obtain a much stronger perturbation result that depends 
on the relative gap between the eigenvalues. There are many 
variants of relative perturbation results that have been pro- 
posed to date [5, 16, 23, 24] that differ from one another de- 
pending on the underlying matrix A (e.g., whether it is a 
symmetric matrix, positive definite matrix, indefinite ma- 
trix, etc.) and also on the type of perturbation. 

3.3 Relation to Simultaneous Iteration 

Our problem also has a close connection to various algo- 
rithms for computing the eigenvalues and eigenvectors of 
symmetric matrices. One algorithm that we shall focus on 
is the simultaneous iteration method [9, 30]. This method is 
best suited for cases when we are interested in computing 
the top few eigenvalues and their corresponding eigenvectors 
and when the underlying matrix is sparse. 

To state the algorithm explicitly, let us set some notation. 
Let A be an n x n positive-semidefinite matrix with eigen- 
decomposition A = VAV T . We let Xj and Vj denote the 
jth eigenvalue of A and its corresponding eigenvector. We 
also assume that the eigenvalues are ordered in descending 
order such that Ai>A2>--->A n >0. We then pick a 
set of trial vectors and denote them as pi,p 2 , . . . ,Pk, where 
the number of trial vectors k depends on how many eigen- 
vectors we wish to compute. The trial vectors can be any 
set of orthonormal vectors such that 



span(pi, . . . ,p k ) P)span(t; fe+ i 



= {0}. 



(2) 



One possible choice of trial vectors is a set of k orthonor- 
mal vectors that are chosen randomly. Let us stack the 
trial vectors into columns of a matrix and denote it as P — 
[pi, . . . ,Pk]- Given this notation the simultaneous iteration 
method is carried out as follows: 

1. W m <- AP 

2. for i = 1, 2, . . . 

(a) Q w i? w <- W (i) via the QR-decomposition 

(b) W { ' +1) <- AQ (l) 

(c) if stopping criterion is not met, set i <— i + 1 and 
go back to step (a), otherwise output Q^>. 

As we can see, the simultaneous iteration method iteratively 
refines the set of eigenvectors of A. Denoting the jth column 
of Q (i) as qf , it is known [30] that 

lkf-^|| 2 = 0(p}), 

where pj — max{Aj+i/A.,, Xj/Xj-i}. From this we can see 
that the rate of convergence of the eigenvectors depends on 
the ratio between the eigenvalues. Put differently, an eigen- 
vector that corresponds to an eigenvalue with a favorable 
eigenvalue ratio pj will converge faster to the true eigenvec- 
tor. This is also similar in spirit to the relative perturbation 
results that we discussed above in that pj provides relative 
measure of the closeness between the eigenvalues, and the 
accuracy between Vj and q^' depends on pj . 

Lastly, let us focus on the very first step in the simultane- 
ous iteration algorithm, in which we multiply the original 
matrix A with k (potentially randomly chosen) vectors. In- 
terestingly, this is similar in spirit to our algorithm, except 
that our data matrix X is not necessarily square or positive- 
semidefinite, and we do not require the rows of <J> to be or- 
thonormalized. We would like to note that — when they are 
used in the simultaneous iteration method — random trial 
vectors are merely chosen as a way to satisfy the condi- 
tion (2). We believe, however, that randomness will also 
help to better preserve the true eigenvectors in the first it- 
eration. 

3.4 Randomized Algorithms for Linear Alge- 
bra 

In a similar vein, there have been a large number of results 
on what we will refer to as randomized algorithms for lin- 
ear algebra. The monograph [25] covers a number of these 
methods and references. 

There are several lines of work that are closely related to 
our results. The first involves the spectral analysis of ran- 
dom matrices and the application to algorithmic tasks such 
as information retrieval and spectral random graph analysis. 
Two representative works are [3, 10] which build random ma- 
trix models (or random perturbations of random matrices) 
for data and graphs and then use those models to find the 
approximate spectral structures in the data (or the SVDs). 
Many of the perturbation results used in these papers fall 
into our category of absolute bounds. 

1 There are a few variants of the simultaneous iteration 
method. Here, we use the algorithm presented in [30]. 



A second line of work is that of robust PCA or low-rank ma- 
trix completion, of which [7] is just one example (there are 
many other such papers). In this problem, the data matrix 
X is either sampled or random linear measurements of the 
form $X$ T are obtained, from which a sparse, low-rank 
approximation X to the original data matrix is produced. 
The primary goal of this line of work is to approximate the 
original data with a parsimonious representation. Our work, 
in contrast, aims to recover or to approximate the parsimo- 
nious representation itself. 

A third line of work is the recovery of principal components 
of a data matrix X from compressive projection measure- 
ments [18,29]. Briefly speaking, in these works, a rectangu- 
lar data matrix X of size p x n is considered where p < n, 
i.e., X has more columns than rows. Each column of X rep- 
resents a data sample, and the objective in these works is 
to compute the left singular vectors of X from compressive 
projections of the columns of X. However, different random 
projection matrices are applied to different columns of X. 
The key differences between the work proposed in [18, 29] 
and our work are that 1) we are interested in the right sin- 
gular vectors of X, 2) the data matrix X in our problem 
is assumed to have more rows than columns, 3) our mea- 
surement matrix is a JL matrix, and 4) we apply the same 
random matrix to every column of X. 

Finally, we emphasize the distinction between subspace ap- 
proximation and the approximation of the singular vectors 
themselves. Feldman et al. [17] give coreset and sketching al- 
gorithms for approximating subspaces spanned by portions 
of the data set. This problem is also similar to the work 
of Halko et al. [20], in which one constructs a basis for an 
approximate subspace spanned by the columns of X from a 
sketch Y = <E>X of the data. Finally, we note that the work 
of Drineas et al. [15] that approximates leverage scores of 
a matrix is similar in nature to ours but does not produce 
approximate singular vectors. 

4. MAIN RESULT 

4. 1 Proposed Algorithm and Estimation Bounds 

Recall the problem setup discussed in Section 2. Our algo- 
rithm for estimating Ex and Vx from the sketched matrix 
Y = $X is stated in Algorithm 1. This algorithm is very 
simple: we simply return the truncated singular values and 
right singular vectors of Y. 



Algorithm 1: Pseudo-code for sketched SVD. 
Input: Sketched matrix Y — $X 

Output: Ex and Vx (estimates of the singular values and 

right singular vectors, respectively, of X) 
(Uy,Zy,Vy)<-$VF>(Y) 
Ex «— Ey 
Vx^Vy 



The computational complexity of our algorithm can be di- 
vided into two parts. The first part concerns the complex- 
ity of computing the sketch Y from $ and X. If both $ 
and X are available at a central processing node, Y can be 
computed simply by multiplying <E> and X; as discussed in 
Section 2, there may be a fast algorithm for doing this, de- 
pending on the structure of <3>. As we have noted, however, it 



is also possible to compute the sketch column-by-column by 
applying "3? separately to each column of X; when data is col- 
lected in a distributed fashion, this may be the natural way 
to construct a sketch. Let us denote the computational com- 
plexity computing Y as T\(m, N, n). In distributed scenar- 
ios, we will have Ti(m,N,n) = nT'(m,N), where T'(m,N) 
denotes the computational complexity of one matrix-vector 
multiplication with <!>. 

The second part concerns the complexity of computing the 
SVD of Y. Using standard techniques, computing the SVD 
of an m x n matrix with m > n requires 0(mn 2 ) opera- 
tions. (When k is very small compared to n, we may have 
m < n and the SVD of Y can be computed even more effi- 
ciently than this.) Combining this fact with the bound on 
m provided in our main result (see (3)) and assuming /(e) is 
quadratic in e, the computational complexity of this second 
part will be 0(n 2 fce -2 (log(l/e) + log(l/<5)), where 8 denotes 
the failure probability. One can add this cost to Ti (m, N, n) 
to determine the overall computational complexity, although 
it is important to stress that the computation of Y may be 
streaming or distributed over many sensors, while the com- 
putation of the SVD of Y may be performed all at once at 
a central computing node. 

We are now ready to state our main result. 



Proof: See Section 4.2. 



Corollary 1. When $ is generated randomly from an 
i.i.d. subgaussian distribution (suitably scaled) or some other 
random distribution satisfying the distributional JL property 
with quadratic /(•), the bounds in Theorem 1 will hold with 
m = C(fce- 2 (log(l/e) + log(l/<5)). 

This result states that from Y we can obtain accurate rel- 
ative estimates for the singular values of X, and in some 
cases we can obtain accurate estimates for the right singular 
vectors of X as well. However, we struggle to achieve high 
accuracy in the singular vectors when the corresponding sin- 
gular values are close. This is a consequence of well-studied 
perturbation theory (recall the role that the relative gap 
played in Section 3.2) and seems inherent in our approach. 

We note that, naturally, we could obtain similar results for 
preserving the left singular vectors of X were we to sketch 
its rows, rather than its columns. We also note that when 
the exact rank k of the data matrix is unknown (or if the 
rank of X is not necessarily below n), substituting n for k in 
the measurement bound (3) yields a guarantee that applies 
to any N x n matrix X with N > n. 



Theorem 1. Let X be an N x n matrix with N > n 
and rank(X) = k < n, and let X = Ux^xVx denote the 
truncated SVD of X as explained in Section 2. Let t £ (0, 1) 
denote a distortion factor and S £ (0, 1) denote a failure 
probability, and suppose $ is an m x N random matrix that 
satisfies the distributional JL property with 

fclog(42/ e ) + log(2/5) 



m > 



f(e/V2) 



(3) 



Let Y = $X denote the sketched matrix, and let Ex = Ey 
and Vx = Vy denote the estimated singular vectors and right 
singular values of X returned by Algorithm 1. Then with 
probability exceeding 1 — 5, rank(Y) — k and both of the 
following statements hold: 

1. (Preservation of singular values) For all j = 1, . . . , k, 

(1-e) 1 ' 2 < ^£ < (l + e) 1/2 , 

where o\ > ■ ■ ■ > a k > denote the singular values of 
X ( the diagonal entries of Ex ) and a[ > • ■ • > cr' k > 
denote the singular values of Y ( the diagonal entries 
oftx). 

2. (Preservation of right singular vectors) For all j = 
1, . . . ,k, 

\\vj -v'jh < 



V2, 



— ■ max 



min {\<Ti -o-j(l + ce)|} 

o6[-l,l] 



where v\ , . . . , vu denote the right singular vectors of X 
(the columns of Vx) and v[,...,v' k denote the right 
singular vectors ofY (the columns ofVx)- 



4.2 Proof of Theorem 1 

In this section we prove Statements 1 and 2 within Theo- 
rem 1. As noted in Section 2, it will be useful for us to 
write the truncated eigendecompositions of X T X and Y T Y 
as X T X = V x AxV£ and Y T Y = V Y A Y V?, respectively, 
where Ax = diag(Ai, . . . , Afc) and Ay = diag(A'i, . . . , 



Recall that Aj = a, and A' = er' for j = 1, 



4.2.1 Proof of Statement 1 

In order to prove Theorem 1, we require the following result, 
adapted from Lemma 5.1 in [4] and Theorem 4.3 in [13]. 



Lemma 1 ( [4,13]). Let X denote a k-dimensional sub- 
space of ~R N . Let e £ (0,1) denote a distortion factor and 
5 £ (0, 1) denote a failure probability, and suppose $ is an 
m x N random matrix that satisfies the distributional JL 
property with 



m > 



fclog(42/e)+log(2/5) 



f(e/V2) 

Then with probability exceeding 1 — 5, 



VI - e|M| 2 < \\$xh < VT+e| 



X 2, 



for all x £ X . 



To see how this lemma can help us guarantee the preserva- 
tion of the singular values of X (or, equivalently, the eigen- 
values of X T X), we begin by noting that 

Y T Y = V T $ T $V = VxZxUx^QUxZxVx ■ 

We define a new k x k matrix 

M := VxY T YV x = E x !7|$ T $feEx 



and represent its eigendecompostion as M = VmAmVm. 
Noting that Y T Y = Vy^Vy \ we have 

M = VxY T YV x = VxVy^yVyVx. 

From this we can infer that Am = Ey, i.e., M has the same 
set of eigenvalues as Y T Y . Thus, we turn our attention to 
proving that the eigenvalues of M approximate the eigen- 
values of X T X. Let us define A$ := $ T $ — I and consider 
the ratio 

x T Mx i T Sxf/|$ T $C/ x Sxi 



x^ S x 



x^ S x 



1 + 



x T Y,xUxA<s.U x T'xx 

X^ S ^ X 



We will be interested in the range of values that the fraction 
(x T Mx)j (x T T, x x) can take over all nonzero x £ R fc . We 
note that for any vector a; £ R* we can associate a vector 



V ■ 



: and write 

x T Mx 



X^ $j y X 



1 + 



y T Ull\zUxy 



v T y 



To bound the range of values that this quantity can take, it 
suffices to consider all vectors y £ R* with unit norm. Thus, 
we focus on the quantity 

1 + y T U£A*Uxy = 1 + y T u5($ T $ - I)U x y 

= y T U^ T ^U x y = \\3>U x y\\l. (4) 

Our next step is to apply Lemma 1 on the subspace X = 
colspan([/x), using the fact that ||J7x2/||2 = ||y||2 = 1. This 
tells us that with a probability of at least 1 — 5, 

l-e< ||$t/x2/]|2 <l + e, 

k 



holds for all unit norm vectors y £ 
inequality with (4) we get 

-e < y T UlA*U x y < e, 
which implies that for any nonzero x £ R fc 

. x T Mx ^ . 



Combining this 
(5) 

(6) 



In order to complete the proof we use the following lemma, 
which is a simplification of Lemma 1 in [5] . 

Lemma 2 (Lemma 1, [5]). Let H be a diagonal matrix 
and suppose SH has the property that for all nonzero x, 



x T (H + SH)i 



t T H2 



< 9u, 



where < gi < g u - Then 



\,(H + SH) 

91 - xm ^ 

for alii, where \i(Z) denotes the ith eigenvalue of the matrix 



Applying Lemma 2 to (6) with H = Y? x (which is diagonal) 
and H + SH — M completes the proof of Statement 1 of 
Theorem 1 and also implies that rank(Y) = k. 



4. 2. 2 Proof of Statement 2 

In order to prove Statement 2 of Theorem 1, we require the 
following important theorem. 

^Theorem 2 ^Theorem 1, [26]). Let H = UTU* and 
H = H + 6H — UTU* be pxp positive definite matrices. As- 
sume that U and U are unitary and that V — diagfai, . . . , 7 P ) 
and r = diag(ji, . . . ,j p ) are diagonal. Let S = U*U, and 
assume 

r] = \\H~hHH~i\\ < 1, 

where = UT~ 1 / 2 U* . Then for any j and for any set 

2? not containing j we have, 



Ei 



1/2 



< min < 1, max 



1/2—1/2 



iasr Ya - 7j | s/l-r] 



and, in particular, for any i 7^ j , 



\sij\ < min < 1, 



1/2—1/2 

7 1 7j V 



hi -TjI sfi - v 



To prove Statement 2, we continue from the proof of State- 
ment 1. In particular, we suppose (5) holds for all unit norm 
vectors y £ R fe (recall that this event happens with proba- 
bility at least 1 — 8). Our first goal will be to prove that this 
implies that 

(7) 



\UxA$U x \\2 < e. 



To see why this follows, let us for notational simplicity de- 
note A — UxA$Ux- Note that A is a symmetric matrix. 
Then, (5) says that — e < y T Ay < e holds for all unit norm 



V Ay 
V 1 V 



< 



vectors y £ R . Note that this is equivalent to — e < 
e. This fraction is the well known Rayleigh quotient. It can 
be shown that the range of values that the Rayleigh quotient 
takes is confined between the minimum and the maximum 
eigenvalues of A. Let us denote the maximum and minimum 
eigenvalues of A as Q max and a m i n , respectively. Since equa- 
tion (5) says that the Rayleigh quotient is in between — e and 



e we can infer that — e < a n 

\\A\\ 2 = max{|a mi „|, \a n 
(7) holds. 



< 



V Ay 

y 1 y 



< Qn 



< e. Thus, 



} < e and so we have proved that 



To quantify \\vj — i?,- [| 2 , we look at a different yet equivalent 
quantity that may simplify the problem. Let us again look at 
the matrix M that we introduced in the proof of Statement 
1. We have seen that there is a close connection between 
the eigenvalues of M and those of Y T Y . We now show that 
in order to prove that the eigenvectors of Y T Y approximate 
those of X T X, it suffices to study the eigenvectors of M. 



Remembering that 



M 



v£Y T YVx 



we can see that the eigenvectors of M are closely related 
to the right singular vectors of X and Y . Specifically, we 
have that Vm = V x Vy, and denoting the jth eigenvector 
of M as Vj , it is easy to see that Vj = V x vi . This implies 
that (vj,ej) = (vj,v'j) for j = 1, . . . , k, where represents 
the jth canonical basis vector. Furthermore, we note that 



colspan(W) = rowspan(y) = rowspan(X) since every row 
in Y is a linear combination of the rows in X and since 
we have argued above that rank(Y) = rank(X). From this 
(and the fact that v'j £ colspan(W)) it follows that ||u/||a = 
HVx^lla = 1- Now, using the relation (vj,ej) = {vj,v'j} 



and the facts that \\vj\\2 = 
we see that \\vj — e,||a = llu,- 



: \\Vj\\2 = \\Vj\\2 = 1, 

To make sense out of 



the quantity \\vj — e j 1 1 2 , let us examine the expression M = 
Ex + Ex[^xA$i7xEx ■ We can view M as the sum of a diag- 
onal matrix T, x and a perturbation matrix Exf/x A${/xEx- 
The eigenvectors of Y? x are the canonical basis vectors ej. 
Therefore, the quantity \\vj — e j 1 1 2 reflects the amount of per- 
turbation in the eigenvectors of M. This is why, to bound 

IK'j - v 'ih 
of M 



Vj\\2, it suffices to focus on the perturbation analysis 



We apply Theorem 2 as follows: As we have discussed, we 



will quantify \\vj 



|| 2 via \\vj — e j 1 1 2 - Let us set the original 



unperturbed matrix as H — Y, x and the perturbation to this 
matrix as SH = ExJ/x A$(7xEx, such that 

H = H + SH = Ex (J + UxAzU x )Zx = M, 

and both H and H are k x k. Clearly, H is positive definite 
since it is a diagonal matrix with all positive entries along 
the diagonal (because rank(A) — k). To check that M is 
positive definite, note that M = Ex[/x$ T $(7xSx is of the 
form M = B T B, where B = $C/xEx is an m x k matrix 
with m > k. The fact that B has full column rank will 
follow because all diagonal entries of Ex are nonzero (again, 
because rank(X) = k) and because in the proof of Statement 
1 we applied Lemma 1 on the subspace X = colspan([/x). 
Because B has full column rank, M will be positive definite. 

We further have that 
77 = \\H-iSHH-i\\ a = HE^MTE^Ha = ||L#A*l7jr||a. 

Then, applying (7), 77 = ||[7j£A$[/x||a < e. Let us set S = 
I T Vm = Vm and denote the jth eigenvalue of M as \j. 
Then, straightforward application of Theorem 2 yields 

/ \ V2 



1 



< min < 1 , max ■ 



1/2 



'/ 



< min < 1 , max ■ 



<Ji\ 



1/2 



W k?-A 3 -| VT^7 

As we have discussed in Section 2 we assume that Sj 
= (vj-.v'j) > 0. Then, 



\m - v j\\2 



•2{iT J ,e i > + ||e 



2 

J I! 21 



= V2a/1 - s 



33 ■ 



V2 



< V2. 



N 1_ N 



1 — min < 1, max 



Tl/2 

<V2min<l, max- 



where the last inequality is due to the fact that 1 — vl — x < 
a; for < x < 1. 

To write the above only in terms of the unperturbed singular 
values, (Tj, we make use of Statement 1 and the fact that 



Aj = for i = 1- • 



, k to obtain 



s/T—t 



max 



^/2o 



min {la, 



a 2 (l + ce)|} 



5. APPLICATION TO SPECTRAL ANALY- 
SIS OF STREAMING GRAPHS 

In this section, we apply our data analysis framework to 
streaming graphs, a model of data collection where edges of 
a graph are updated dynamically. We consider a scenario in 
which edges are inserted and deleted over an observation pe- 
riod, and our goal is to maintain a small data structure that 
encodes the graph information so that we may analyze the 
spectrum of the graph quickly at any point during or after 
the sequence of edge updates. Spectral graph analysis has a 
multitude of applications including graph embedding, graph 
isomorphism testing, data clustering/segmentation (of which 
there are yet many more applications!), numerical linear al- 
gebra, etc. We refer the reader to just a few in [6, 8, 27, 31]. 
Determining the spectrum of a graph is at the heart of many 
modern data analysis and graphical information processing 
algorithms. 

Let G = (V, E) be a graph with vertex set V and undirected, 
unweighted edges E. Let A denote the symmetric binary 
adjacency matrix of G, denote by d v the degree of a vertex 
v G V , and define the graph Laplacian as 

fd v if u = v 
— 1 if u and v are adjacent 
otherwise. 

There is a compact definition of Lq using the adjacency 
matrix: Lq ~ diag(du) — A. 

Let X be the incidence matrix of the graph G. This matrix 
has TV = \E\ rows and n = \V\ columns and to define each 
entry of X, consider an edge (it, v) between vertices u and 
v. Since the graph is undirected, the ordering of the vertices 
is chosen arbitrarily. Then, 



X, 



V 1 



L (u,j)),u — 1 an d 

It is well-known that the rank of the graph is the rank of the 
incidence matrix and that this value is \V\ — c where c is the 
number of connected components in G. If the graph G is 
weighted, we replace the ±l's with the appropriate weights 
in the incidence matrix. 

From the definitions of the graph Laplacian and the inci- 
dence matrix, it is clear that Lq = X T X. The singular 
values of X are, therefore, related to the eigenvalues of Lg 
in a straightforward fashion: 



<Ti{X) = ^f\~(LG). 



fJxExVj 



Furthermore, the right singular vectors V of X 
are the eigenvectors of the Laplacian. Thus, it is sufficient 



to compute (good) approximations to the singular values 
and the right singular vectors of X to obtain (good) ap- 
proximations to the top eigenvalues of Lq and the corre- 
sponding eigenvectors. From standard spectral graph the- 
ory, we know that the eigenvalues A; of the Laplacian satisfy 
Ai > • • • > A n = and, with the assumption that G has c 
connected components, 

Al > > \„- c > A ?l _ c .f i = = A n = 0. 

In particular, rank(X) = rank(Lc) = n — c. 

Next, we define the streaming graph model. Following [1, 2], 
we define a dynamic graph stream as a stream of edge up- 
dates (both insertions and deletions). This is a faithful 
model of the evolution of an online social network, for ex- 
ample, in which users connect and disconnect to other users 
over time [22]. 

Definition 2 (Dynamic graph stream). A stream S = 
(ai, . . . , or) where at = (jt, kt, At) £ [n] x [n] x R defines a 
weighted graph G = (V, E) where V = [n] and the weight of 
an edge (j, k) is given by 

A(j,k)= Yl At - 

t-(jt,kt) — (j,k) or (k,j) 

We assume that at any update time t, the adjacency matrix 
A is well-formed; that the edge weight is non-negative; and 
that the graph has no self-loops. 

In this prototype application, the stream of edge updates S 
defines the edge-vertex incidence matrix X of the graph G. 
The matrix X has N = (™) rows and n columns, and for 
each stream item, we update two entries in X as 

X Ut,kt),H = X Ut,kt),3t + At, 

X UtM)M = x (j%M)M ~ 

We collect sketches of each column of X and aggregate them 
into a matrix Y . Denoting the jth column of X by Xj, the 
jth column of the sketched matrix is given by yj = &Xj. We 
can update the sketch in a streaming fashion. Upon receipt 
of a stream item (u t , Vt, At), we update y ut and y Vf : 

Uut = yu t + A t (t> (uuvt) , 
where <f>j denotes the jth column of 

Our main result, Theorem 1, tells us that a sketch of the ma- 
trix X is sufficient to recover information about its singular 
value decomposition. 

Corollary 2. Assume that the undirected, weighted graph 
G — (V, E) is presented in a streaming fashion so that its 
incidence matrix X has n = \V\ columns, N = (™) rows, 
and rank k < n— 1. Let e £ (0, 1) denote a distortion factor 
and S € (0, 1) denote a failure probability, and suppose $ is 
an m x N random matrix that satisfies the distributional JL 
property with 

m ^ fclog(42/ e )+log(2/tf) 



Let Y = $X denote an m x n sketch of X maintained in 
the streaming graph model, and let Ex = Ey and Vx — 
Vy denote the estimated singular vectors and right singular 
values of X returned by Algorithm 1. Then with probability 
at least 1 — 5, the following statements hold: 

1. (Preservation of eigenvalues) For all j — 1, ... , k, 

A' 

1 - e< < 1 + e 

where \j denote the true eigenvectors of the graph Lapla- 
cian Lq and \'j denote the estimated eigenvalues ob- 
tained by squaring the diagonal entries of T,x ■ 

2. (Preservation of eigenvectors) For all j = 1, . . . , k, 
\\vj - Villa < 

mm j^' TT^^ ^mm^-A^l + c.)!} !' 

where Vj are the eigenvectors of the graph Laplacian 
Lq and v'j denote the estimated eigenvectors obtained 

from the columns of Vx ■ 

Because the adjacency matrix A of G has at most |V| 2 non- 
zero entries, this result is useful only when the rank k of Lg 
is significantly smaller than n — \V\, the number of vertices; 
or, equivalently, when G has many connected components. 
In this case, the size of the sketch is smaller than that of 
the adjacency matrix. In summary, for highly disconnected 
graphs presented in a streaming fashion, we can recover the 
approximate eigenvalues and eigenvectors of the Laplacian. 
The sparsity of the matrix $ and the speed with which we 
can update the sketch matrix Y under a stream of updates 
are functions of the quality of the JL transform. The struc- 
tural evolution of online social networks [22] suggests that 
it is reasonable to assume that the underlying graph has a 
significant fraction of vertices in small, disconnected compo- 
nents so that the graph is essentially a low-rank graph. 

6. CONCLUSION 

We present a data collection and analysis scheme that per- 
mits the distributed collection of data X by resource con- 
strained sensors in a network and the central computation 
of the spectral decomposition of X T X or the right singu- 
lar vectors of the data itself. The algorithm returns not an 
approximation to the original data, but a good approxima- 
tion to the singular values Oj and the right singular vectors 
Vj of the data. This data collection and analysis framework 
makes a small number of linear, non-adaptive measurements 
of the data. The number of measurements each sensor makes 
is comparable to the rank of the data and, if the data are 
full rank, the number of measurements at each sensor is 
comparable to the total number of sensors. This efficient 
data collection is especially important for sensors that are 
severely resource constrained and cannot store or transmit 
a large amount of data to a central device; we believe that 
one possible application of such an algorithm would be in 
operational modal analysis of structures (buildings, bridges, 
etc.) [28]. 
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